Neural forecasting models¶
We will cover neural models for time series forecasting, both trained from scratch and pretrained. We will use various libraries, depending on the model, for example:
- sktime - general time series processing
- neuralforecast - many neural models for time series, e.g. DLinear, N-BEATS
- PyTorch - deep learning framework
- timesfm - official TimesFM implementation (and loading pretrained model)
Use tutorials, quickstarts, GitHub pages etc. of those libraries as necessary.
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)
warnings.simplefilter(action="ignore", category=UserWarning)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
Datasets and evaluation¶
We will use 2 datasets:
- Italian pasta dataset, same as in the first notebook.
- Polish energy production data, as published by Energy Instrat and ENTSO-e, from data by PSE (Polskie Sieci Elektroenergetyczne).
Both are multivariate and focused on long-term forecasting.
Italian pasta¶
Data loading and visualization¶
This dataset is technically multivariate, but it contains data from 4 different companies with very different characteristics, so cross-series dependencies may be fairly weak. We will consider a simplified variant with no exogenous variables.
from sktime.utils.plotting import plot_series
df_pasta = pd.read_csv("italian_pasta.csv")
for num in [1, 2, 3, 4]:
company_qty_cols = [col for col in df_pasta.columns if col.startswith(f"QTY_B{num}")]
df_pasta[f"value_B{num}"] = df_pasta[company_qty_cols].sum(axis="columns")
df_pasta = df_pasta.set_index(pd.to_datetime(df_pasta["DATE"])).asfreq("d")
df_pasta = df_pasta[["value_B1", "value_B2", "value_B3", "value_B4"]]
df_pasta
| DATE | value_B1 | value_B2 | value_B3 | value_B4 |
|---|---|---|---|---|
| 2014-01-02 | 101.0 | 186.0 | 32.0 | 36.0 |
| 2014-01-03 | 136.0 | 248.0 | 32.0 | 27.0 |
| 2014-01-04 | 162.0 | 264.0 | 94.0 | 45.0 |
| 2014-01-05 | 106.0 | 106.0 | 18.0 | 29.0 |
| 2014-01-06 | 47.0 | 54.0 | 9.0 | 11.0 |
| ... | ... | ... | ... | ... |
| 2018-12-27 | 203.0 | 143.0 | 30.0 | 39.0 |
| 2018-12-28 | 192.0 | 187.0 | 28.0 | 44.0 |
| 2018-12-29 | 158.0 | 217.0 | 44.0 | 48.0 |
| 2018-12-30 | 182.0 | 211.0 | 40.0 | 27.0 |
| 2018-12-31 | 243.0 | 206.0 | 48.0 | 42.0 |
1825 rows × 4 columns
for num in [1, 2, 3, 4]:
plot_series(df_pasta[f"value_B{num}"], colors = ['royalblue'], title=f"Pasta sales, business {num}")
Evaluation¶
Similarly to the first notebook, we will be interested in long-term forecasting, predicting the daily sales for 2018, based on previous years. Since we have 4 time series with different scales, MASE is a great metric, since it can be averaged across series.
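As a quick reminder of what MASE measures, here is a minimal plain-NumPy sketch with toy numbers (not the sktime implementation): the forecast MAE is scaled by the in-sample MAE of the one-step naive forecast, so a value below 1 beats the naive baseline and values are comparable across series.

```python
import numpy as np

def mase(y_true, y_pred, y_train):
    # MASE: forecast MAE divided by the in-sample MAE of the naive
    # one-step-ahead forecast, making values comparable across series
    mae_forecast = np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
    mae_naive = np.mean(np.abs(np.diff(np.asarray(y_train))))
    return mae_forecast / mae_naive

# toy example: constant +1 forecast error vs. a train series with unit steps
y_train = np.array([1.0, 2.0, 3.0, 4.0])  # naive in-sample MAE = 1.0
y_true = np.array([5.0, 6.0])
y_pred = np.array([6.0, 7.0])             # forecast MAE = 1.0
print(mase(y_true, y_pred, y_train))      # -> 1.0
```

sktime's `mean_absolute_scaled_error` used below computes essentially the same quantity (for seasonal periodicity 1).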
from sktime.transformations.series.impute import Imputer
df_pasta_train = df_pasta[df_pasta.index < "2018-01-01"]
df_pasta_test = df_pasta[df_pasta.index >= "2018-01-01"]
imputer = Imputer(method="ffill")
df_pasta_train = imputer.fit_transform(df_pasta_train)
df_pasta_test = imputer.transform(df_pasta_test)
print(f"Data size: train {len(df_pasta_train)}, test {len(df_pasta_test)}")
Data size: train 1460, test 365
Polish energy production¶
Data loading and visualization¶
The energy mix is composed of multiple energy sources, which typically fall into a few groups:
- slow-changing base, e.g. coal, nuclear
- faster-changing, controllable sources, e.g. gas, oil, hydro
- very cheap, but uncontrollably changing renewables, e.g. wind, solar
The resulting production is always limited by grid capacity, which is very constrained in Poland, leading e.g. to refusals to connect new prosumer solar installations. As such, production limits are monitored and controlled, and cross-series dependencies are often quite strong.
We will aggregate the energy sources a bit, and consider:
- coal (and derivatives)
- hydro (from all sources)
- solar
- wind
- all others, e.g. oil (petroleum), biomass
Since the units are GWh (10^9 Wh, watt-hours), the values are very large, so we will use thousands of GWh, i.e. TWh (10^12 Wh). This rescaling should help with numerical stability for methods that do not perform standardization or scaling.
Data from PSE changed its format and processing on 13.06.2024; values since that date are reported in 15-minute intervals, compared to 1-hour intervals before. We therefore divide them by 4 to keep the same unit.
If you want to know more about energy production and demand, see e.g. this video or this video.
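To see why dividing by 4 keeps daily totals consistent, here is a toy check (assuming, as appears to be the case, that readings are interval-average power):

```python
import numpy as np

# toy example: a day of constant 100 MW average power, reported either as
# 24 hourly readings or 96 quarter-hourly readings of the same quantity
hourly = np.full(24, 100.0)
quarter_hourly = np.full(96, 100.0)

daily_from_hourly = hourly.sum()                # 2400.0
daily_from_quarters = quarter_hourly.sum() / 4  # dividing by 4 matches the hourly total
print(daily_from_hourly, daily_from_quarters)   # -> 2400.0 2400.0
```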
df_energy = pd.read_csv("electricity_production_entsoe_all.csv")
df_energy = df_energy.drop(columns="date_utc")
df_energy["date"] = pd.to_datetime(df_energy["date"], format="%d.%m.%Y %H:%M")
df_energy = df_energy.set_index("date")
df_energy = df_energy.resample("D").sum()
# aggregate energy sources
df_energy["coal"] = (
df_energy["hard_coal"] + df_energy["coal-derived"] + df_energy["lignite"]
)
df_energy["hydro"] = (
df_energy["hydro_pumped_storage"] +
df_energy["hydro_run-of-river_and_poundage"] +
df_energy["hydro_water_reservoir"]
)
df_energy["wind"] = df_energy["wind_onshore"]
df_energy["other"] = (
df_energy["oil"] +
df_energy["biomass"] +
df_energy["other"] +
df_energy["other_renewable"]
)
df_energy = df_energy[["coal", "gas", "hydro", "wind", "solar", "other"]]
# fix values and change units (GWh -> thousands of GWh)
# use .loc to avoid chained assignment, and an ISO date to avoid ambiguous parsing
df_energy.loc[df_energy.index >= "2024-06-13"] /= 4
df_energy.loc[df_energy.index >= "2024-06-13", "other"] /= 2
df_energy = df_energy / 1000
df_energy
| date | coal | gas | hydro | wind | solar | other |
|---|---|---|---|---|---|---|
| 2015-01-02 | 298.90400 | 10.13800 | 9.25500 | 77.61100 | 0.0000 | 5.702000 |
| 2015-01-03 | 288.79200 | 10.25900 | 9.30400 | 80.98500 | 0.0000 | 5.155000 |
| 2015-01-04 | 271.66200 | 9.89600 | 7.83800 | 77.09600 | 0.0000 | 4.175000 |
| 2015-01-05 | 367.48300 | 10.03000 | 7.09000 | 43.30900 | 0.0000 | 5.627000 |
| 2015-01-06 | 343.57200 | 10.26700 | 7.06800 | 18.49900 | 0.0000 | 5.631000 |
| ... | ... | ... | ... | ... | ... | ... |
| 2024-10-30 | 272.83625 | 68.80675 | 4.48175 | 87.23000 | 13.5885 | 16.192750 |
| 2024-10-31 | 252.18300 | 60.90650 | 4.63500 | 111.13200 | 20.7190 | 16.189625 |
| 2024-11-01 | 158.76150 | 43.92125 | 5.50675 | 169.88575 | 23.0975 | 14.869250 |
| 2024-11-02 | 190.79925 | 47.28800 | 5.18575 | 88.41450 | 45.9005 | 15.769750 |
| 2024-11-03 | 138.62925 | 38.76325 | 3.76750 | 71.03375 | 32.5945 | 11.860500 |
3594 rows × 6 columns
from sktime.utils.plotting import plot_series
plot_series(df_energy.sum(axis="columns"), colors = ['royalblue'], title="Total energy production")
for col in df_energy.columns:
plot_series(df_energy[col], colors = ['royalblue'], title=f"{col.capitalize()} energy production")
Evaluation¶
We will perform long-term forecasting, a common task for energy production and demand datasets. We will predict production for 2024, using the MASE metric.
from sktime.transformations.series.impute import Imputer
df_energy_train = df_energy[df_energy.index < "2024-01-01"]
df_energy_test = df_energy[df_energy.index >= "2024-01-01"]
print(f"Data size: train {len(df_energy_train)}, test {len(df_energy_test)}")
Data size: train 3286, test 308
Forecasting¶
The sub-sections are independent and can be implemented in any order. The more you do, the more points (and hence the higher mark) you get. They are also more freeform than in the previous notebook, and there are more options to choose from.
When tuning hyperparameters, choose any strategy you think is reasonable, taking into consideration computational cost and model complexity. Temporal train-valid-test split, time split CV, expanding window - your choice. Even manual tuning is ok, if you think it makes sense, but remember to use the validation set.
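For instance, an expanding-window split can be sketched in a few lines of plain NumPy (the function and parameter names here are illustrative; sktime's `ExpandingWindowSplitter` provides the same scheme ready-made):

```python
import numpy as np

def expanding_window_splits(n_samples, initial_window, horizon, step):
    # yield (train_indices, test_indices) pairs where the training window
    # grows by `step` each fold and the validation window is the next
    # `horizon` points -- the "expanding window" CV scheme
    start = initial_window
    while start + horizon <= n_samples:
        yield np.arange(0, start), np.arange(start, start + horizon)
        start += step

# 10 daily points, start with 6, forecast 2 ahead, advance by 2
folds = list(expanding_window_splits(10, initial_window=6, horizon=2, step=2))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx.tolist())
# -> 6 [6, 7]
#    8 [8, 9]
```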
You can use any framework and tool you want, but suggestions are provided in the sections below. Install additional dependencies as needed, either using Poetry and recreating the lock file, or by directly using !pip install ....
Training and evaluating more models from a particular category can get you more points, as described below. If you prefer, you can also experiment with other models, e.g. RNNs, CNN-based models, or state-space models (SSMs), adding further sections. Each one is worth 2 points by default.
Warning: when making this notebook, some errors with neuralforecast cropped up when the horizon was greater than 292 for the Italian pasta dataset. You can cut the test set to 292 steps if necessary.
Note that some frameworks (e.g. neuralforecast) require "tall"/"long" time series representation, with columns: unique_id (time series identifier), ds (date) and y (actual value). This is in contrast to the "wide" representation, where we have individual series in separate columns, each row with separate date, and values in cells. See e.g. neuralforecast quickstart for an example. Functions prepared below may be useful.
from typing import Optional
def wide_to_long_df(df: pd.DataFrame) -> pd.DataFrame:
df = pd.melt(df, ignore_index=False).reset_index(names="date")
df = df.rename(columns={"variable": "unique_id", "date": "ds", "value": "y"})
return df
def long_to_wide_df(df: pd.DataFrame, values_col: Optional[str] = None) -> pd.DataFrame:
if "unique_id" not in df.columns:
df = df.reset_index(names="unique_id")
values_col = values_col if values_col else df.columns[-1]
df = pd.pivot(df, columns="unique_id", index="ds", values=values_col)
return df
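A quick, self-contained round-trip demonstration of the idea behind these helpers, using a toy frame (the column names A and B are arbitrary):

```python
import pandas as pd

# a tiny "wide" frame: one column per series, dates as the index
idx = pd.date_range("2024-01-01", periods=3, freq="D", name="ds")
wide = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]}, index=idx)

# wide -> long: one row per (series, date) pair, as neuralforecast expects
long = pd.melt(wide, ignore_index=False).reset_index()
long.columns = ["ds", "unique_id", "y"]

# long -> wide: pivot back; the round trip recovers the original values
wide_again = long.pivot(index="ds", columns="unique_id", values="y")
print((wide_again.to_numpy() == wide.to_numpy()).all())  # -> True
```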
Baselines (2 points)¶
Implement baselines for neural models:
- last value (naive)
- average
- AutoARIMA
- AutoETS (with damped trend)
Each dataset is worth 1 point. sktime will be useful.
from sktime.forecasting.base import ForecastingHorizon
from sktime.performance_metrics.forecasting import mean_absolute_scaled_error
def evaluate_model(
    model,
    df_train: pd.DataFrame,
    df_test: pd.DataFrame,
    plot_forecasts: bool = False,
) -> None:
    fh = ForecastingHorizon(np.arange(1, len(df_test) + 1), is_relative = True)
    model.fit(y = df_train, fh = fh)
    df_pred = model.predict(fh = fh)
    mase = mean_absolute_scaled_error(df_test, df_pred, y_train = df_train)
    print(f'MASE: {mase:.2f}')
    if plot_forecasts:
        df = pd.concat([df_train, df_test])
        for col in df_train.columns:
            plot_series(df[col], df_pred[col], colors = ['royalblue', 'orangered'], labels = ['y', 'y_pred'], title = f'Predicted {col}')
            plt.show()
        plt.close('all')  # close figures instead of clf() to avoid blank-figure output
from sktime.forecasting.naive import NaiveForecaster
from sktime.forecasting.statsforecast import StatsForecastAutoARIMA, StatsForecastAutoETS
last_forecaster = NaiveForecaster(strategy = 'last')
mean_forecaster = NaiveForecaster(strategy = 'mean')
auto_arima = StatsForecastAutoARIMA(seasonal = False)
auto_ets = StatsForecastAutoETS(season_length = 1, model = 'ZZZ', damped = True)
models = [last_forecaster, mean_forecaster, auto_arima, auto_ets]
models_names = ['Last Forecaster', 'Mean Forecaster', 'AutoARIMA', 'AutoETS']
baselines_dict = dict(zip(models_names, models))
def perform_forecasting(train_df: pd.DataFrame, test_df: pd.DataFrame, models_dict: dict, plot_forecasts: bool) -> None:
print(f'Performing forecasting for 1-year horizons\n{"=" * 100}')
for model_name, model in models_dict.items():
print(f'Model: {model_name}')
evaluate_model(model, train_df, test_df, plot_forecasts = plot_forecasts)
print('=' * 100)
perform_forecasting(df_pasta_train, df_pasta_test, baselines_dict, plot_forecasts = True)
Performing forecasting for 1-year horizons
====================================================================================================
Model: Last Forecaster
MASE: 0.85
====================================================================================================
Model: Mean Forecaster
MASE: 1.10
====================================================================================================
Model: AutoARIMA
MASE: 1.11
====================================================================================================
Model: AutoETS
MASE: 0.94
====================================================================================================
perform_forecasting(df_energy_train, df_energy_test, baselines_dict, plot_forecasts = True)
Performing forecasting for 1-year horizons
====================================================================================================
Model: Last Forecaster
MASE: 2.36
====================================================================================================
Model: Mean Forecaster
MASE: 3.44
====================================================================================================
Model: AutoARIMA
MASE: 2.77
====================================================================================================
Model: AutoETS
MASE: 3.17
====================================================================================================
Linear models (2 points)¶
Implement linear neural models:
- multioutput linear regression
- LTSF Linear
- LTSF DLinear
- LTSF NLinear
Note that Linear is a multi-channel model, while multioutput linear regression is single-channel.
Tune the lookback window, the only hyperparameter of those models, or justify your choice in a comment if you don't. You can check the papers for reasonable values.
If you use a given model, train it on both datasets. Each model is worth 0.5 points. Useful libraries: sktime, neuralforecast, PyTorch.
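To make the decomposition concrete, here is a minimal single-channel NumPy sketch of the DLinear idea: a moving-average trend plus a seasonal remainder, each projected to the horizon by its own linear map. The weights below are untrained placeholders and the kernel size 25 follows the paper's default; this is an illustration, not the sktime or neuralforecast implementation.

```python
import numpy as np

def dlinear_forecast(history, w_trend, w_seasonal, kernel=25):
    # DLinear in one channel: split the lookback into a moving-average
    # trend and a seasonal remainder, then map each part to the horizon
    # with its own linear layer and sum the two forecasts
    pad = kernel // 2
    padded = np.concatenate([
        np.full(pad, history[0]),   # edge padding, mirroring the paper's AvgPool
        history,
        np.full(pad, history[-1]),
    ])
    trend = np.convolve(padded, np.ones(kernel) / kernel, mode="valid")
    seasonal = history - trend
    return trend @ w_trend + seasonal @ w_seasonal

# shapes only: lookback 96 -> horizon 14; real weights would be trained
history = np.sin(np.arange(96) / 7.0)
w_trend = np.zeros((96, 14))
w_seasonal = np.zeros((96, 14))
print(dlinear_forecast(history, w_trend, w_seasonal).shape)  # -> (14,)
```

Note that with `w_trend == w_seasonal` the model collapses to plain Linear, since trend + seasonal equals the original history.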
Useful references:
From this point on, I tune the hyperparameters of each model using information gathered from the linked articles. Where certain parameters could not be found, or the computational cost was too high, I adjusted them manually.
from sktime.forecasting.ltsf import (
LTSFLinearForecaster,
LTSFDLinearForecaster,
LTSFNLinearForecaster
)
import logging
import optuna
optuna.logging.set_verbosity(optuna.logging.WARNING)
def objective(trial, model, df_train, df_test):
    seq_len = trial.suggest_categorical('seq_len', [24, 48, 72, 96, 120, 144, 168, 192, 336, 504, 672, 720])
    fh = ForecastingHorizon(np.arange(1, len(df_test) + 1), is_relative = True)
    # pred_len expects an integer horizon, not a ForecastingHorizon object
    model_instance = model(seq_len = seq_len, pred_len = len(df_test))
    model_instance.fit(y = df_train, fh = fh)
    df_pred = model_instance.predict(fh = fh)
    mase = mean_absolute_scaled_error(y_true = df_test, y_pred = df_pred, y_train = df_train)
    return mase
def tune_lookback_window(df_train: pd.DataFrame, df_test: pd.DataFrame, text:str) -> None:
linear_neural_models = {
'LTSF Linear': LTSFLinearForecaster,
'LTSF DLinear': LTSFDLinearForecaster,
'LTSF NLinear': LTSFNLinearForecaster
}
print(f'Hyperparameter tuning for {text} dataset\n{"=" * 45}')
for model_name, model in linear_neural_models.items():
print(f'Model: {model_name}')
study = optuna.create_study(direction = 'minimize')
study.optimize(lambda trial: objective(trial, model, df_train, df_test), n_trials = 100)
        print(f'Best params: {study.best_params}, MASE: {study.best_value:.2f}\n{"=" * 45}')
tune_lookback_window(df_pasta_train, df_pasta_test, 'Italian pasta')
Hyperparameter tuning for Italian pasta dataset
=============================================
Model: LTSF Linear
Best params: {'seq_len': 24}, MASE: 0.87
=============================================
Model: LTSF DLinear
Best params: {'seq_len': 24}, MASE: 0.88
=============================================
Model: LTSF NLinear
Best params: {'seq_len': 24}, MASE: 0.95
=============================================
from sklearn.linear_model import LinearRegression
from sktime.forecasting.compose import make_reduction
multioutput_linear_regression = make_reduction(LinearRegression(), strategy = 'multioutput')
ltsf_linear = LTSFLinearForecaster(seq_len = 24, pred_len = df_pasta_test.shape[0])
ltsf_dlinear = LTSFDLinearForecaster(seq_len = 24, pred_len = df_pasta_test.shape[0])
ltsf_nlinear = LTSFNLinearForecaster(seq_len = 24, pred_len = df_pasta_test.shape[0])
models = [multioutput_linear_regression, ltsf_linear, ltsf_dlinear, ltsf_nlinear]
models_names = ['multioutput linear regression', 'LTSF Linear', 'LTSF DLinear', 'LTSF NLinear']
linear_models_dict = dict(zip(models_names, models))
def perform_forecasting(train_df: pd.DataFrame, test_df: pd.DataFrame, models_dict: dict, plot_forecasts: bool) -> None:
print(f'Performing forecasting for 1-year horizons\n{"=" * 100}')
for model_name, model in models_dict.items():
print(f'Model: {model_name}')
if model_name == 'multioutput linear regression':
for col in train_df.columns:
train_col_df = train_df[[col]]
test_col_df = test_df[[col]]
evaluate_model(model, train_col_df, test_col_df, plot_forecasts)
print('=' * 100)
else:
evaluate_model(model, train_df, test_df, plot_forecasts)
print('=' * 100)
perform_forecasting(df_pasta_train, df_pasta_test, linear_models_dict, plot_forecasts = True)
Performing forecasting for 1-year horizons
====================================================================================================
Model: multioutput linear regression
MASE: 1.06
====================================================================================================
MASE: 0.70
====================================================================================================
MASE: 1.53
====================================================================================================
MASE: 1.78
====================================================================================================
Model: LTSF Linear
MASE: 0.88
====================================================================================================
Model: LTSF DLinear
MASE: 0.89
====================================================================================================
Model: LTSF NLinear
MASE: 0.96
====================================================================================================
tune_lookback_window(df_energy_train, df_energy_test, 'Polish energy production')
Hyperparameter tuning for Polish energy production dataset
=============================================
Model: LTSF Linear
Best params: {'seq_len': 336}, MASE: 1.69
=============================================
Model: LTSF DLinear
Best params: {'seq_len': 336}, MASE: 1.66
=============================================
Model: LTSF NLinear
Best params: {'seq_len': 336}, MASE: 1.64
=============================================
ltsf_linear = LTSFLinearForecaster(seq_len = 336, pred_len = df_energy_test.shape[0])
ltsf_dlinear = LTSFDLinearForecaster(seq_len = 336, pred_len = df_energy_test.shape[0])
ltsf_nlinear = LTSFNLinearForecaster(seq_len = 336, pred_len = df_energy_test.shape[0])
models = [multioutput_linear_regression, ltsf_linear, ltsf_dlinear, ltsf_nlinear]
models_names = ['multioutput linear regression', 'LTSF Linear', 'LTSF DLinear', 'LTSF NLinear']
linear_models_dict = dict(zip(models_names, models))
perform_forecasting(df_energy_train, df_energy_test, linear_models_dict, plot_forecasts = True)
Performing forecasting for 1-year horizons
====================================================================================================
Model: multioutput linear regression
MASE: 2.06
====================================================================================================
MASE: 5.81
====================================================================================================
MASE: 2.03
====================================================================================================
MASE: 1.62
====================================================================================================
MASE: 20.78
====================================================================================================
MASE: 6.20
====================================================================================================
Model: LTSF Linear
MASE: 1.79
====================================================================================================
Model: LTSF DLinear
MASE: 1.83
====================================================================================================
Model: LTSF NLinear
MASE: 1.75
====================================================================================================
MLP-based models (2 points)¶
Implement MLP-based neural models:
- N-BEATS
- TSMixer
For N-BEATS, use the interpretable architecture variant. If you want to tune hyperparameters, you can use e.g. automated class from neuralforecast with Ray or Optuna frameworks.
Training each model on each dataset is worth 0.5 points. Useful libraries: neuralforecast, PyTorch, pytorch-tsmixer.
Other interesting MLP-based models are e.g. N-HiTS, TiDE, TimeMixer, SOFTS. Each additional model is graded like models above.
Useful references:
- "N-BEATS: Neural basis expansion analysis for interpretable time series forecasting" B. Oreshkin et al.
- "TSMixer: An All-MLP Architecture for Time Series Forecasting" S. Chen et al.
- "N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting" C. Challu et al.
- "Long-term Forecasting with TiDE: Time-series Dense Encoder" A. Das et al.
- "TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting" S. Wang et al.
- "SOFTS: Efficient Multivariate Time Series Forecasting with Series-Core Fusion" L. Han et al.
- neuralforecast forecasting models list
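The "interpretable" part of N-BEATS comes from fixed basis expansions: e.g. the trend stack's MLP emits a few polynomial coefficients, and the forecast is their expansion over normalized time. A minimal NumPy sketch of that trend basis (the function name and normalization here are illustrative):

```python
import numpy as np

def trend_forecast(theta, horizon, degree=2):
    # N-BEATS interpretable trend block: the network outputs polynomial
    # coefficients theta; the forecast is their expansion over
    # normalized time t in [0, 1)
    t = np.arange(horizon) / horizon
    basis = np.stack([t ** p for p in range(degree + 1)])  # (degree+1, horizon)
    return theta @ basis

theta = np.array([1.0, 0.5, 0.0])  # level 1.0, slope 0.5, no curvature
print(trend_forecast(theta, horizon=4).tolist())  # -> [1.0, 1.125, 1.25, 1.375]
```

The seasonality stack works the same way with a Fourier basis instead of polynomials.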
import logging

import optuna
# import all Auto classes used in perform_tuning_and_forecasting below,
# so the multivariate-model check does not fail with a NameError
from neuralforecast.auto import (
    AutoNBEATS,
    AutoSOFTS,
    AutoTiDE,
    AutoTimeMixer,
    AutoTSMixer,
    AutoiTransformer,
)
from neuralforecast.core import NeuralForecast
from neuralforecast.losses.pytorch import MAE

logging.getLogger('pytorch_lightning').setLevel(logging.ERROR)
optuna.logging.set_verbosity(optuna.logging.WARNING)
def config_nbeats(trial):
return {
'max_steps': 200,
'input_size': 616,
'scaler_type': trial.suggest_categorical('scaler_type', ['minmax', 'robust', 'standard']),
'shared_weights': True,
'activation': trial.suggest_categorical('activation', ['ReLU', 'LeakyReLU']),
'batch_size': trial.suggest_categorical('batch_size', [256, 512]),
'windows_batch_size': trial.suggest_categorical('windows_batch_size', [128, 256, 512]),
        'stack_types': ['trend', 'seasonality'],  # only one option, no need to sample it
'n_blocks': [3, 3],
'mlp_units': [[512, 512, 512, 512], [512, 256, 256, 256]],
'n_polynomials': 2,
'random_seed': trial.suggest_int('random_seed', 1, 20),
'enable_progress_bar': False,
'enable_model_summary': False,
}
def perform_tuning_and_forecasting(model_name, config_name, df_train: pd.DataFrame, df_test: pd.DataFrame, text: str, plot_forecasts: bool) -> None:
    # model_name is actually the Auto model class; print its readable name
    print(f'Performing 1-year horizon forecasting for {text} dataset\n{"=" * 100}\nModel: {model_name.__name__}')
fh = df_test.shape[0]
if model_name in [AutoSOFTS, AutoTimeMixer, AutoTSMixer, AutoiTransformer]:
model = model_name(
h = fh,
n_series = df_test.shape[1],
loss = MAE(),
config = lambda trial: config_name(trial, n_series = df_test.shape[1]),
search_alg = optuna.samplers.TPESampler(),
backend = 'optuna',
num_samples = 20,
)
else:
model = model_name(
h = fh,
loss = MAE(),
config = config_name,
search_alg = optuna.samplers.TPESampler(),
backend = 'optuna',
num_samples = 20,
)
df_train = wide_to_long_df(df_train)
fcst = NeuralForecast(models = [model], freq = 'D')
fcst.fit(df = df_train, val_size = 2 * fh)
df_pred = fcst.predict()
df_train = long_to_wide_df(df_train)
df_pred = long_to_wide_df(df_pred)
mase = mean_absolute_scaled_error(y_true = df_test, y_pred = df_pred, y_train = df_train)
print(f'MASE: {mase:.2f}')
    if plot_forecasts:
        df = pd.concat([df_train, df_test])
        for col in df_train.columns:
            plot_series(df[col], df_pred[col], colors = ['royalblue', 'orangered'], labels = ['y', 'y_pred'], title = f'Predicted {col}')
            plt.show()
        plt.close('all')  # close figures instead of clf() to avoid blank-figure output
perform_tuning_and_forecasting(AutoNBEATS, config_nbeats, df_pasta_train, df_pasta_test, 'Italian pasta', plot_forecasts = True)
Performing 1-year horizon forecasting for Italian pasta dataset
====================================================================================================
Model: AutoNBEATS
MASE: 0.87
perform_tuning_and_forecasting(AutoNBEATS, config_nbeats, df_energy_train, df_energy_test, 'Polish energy production', plot_forecasts = True)
Performing 1-year horizon forecasting for Polish energy production dataset
====================================================================================================
Model: AutoNBEATS
MASE: 2.91
from neuralforecast.auto import AutoTiDE
def config_tide(trial):
return {
'max_steps': 100,
'hidden_size': 256,
'input_size': trial.suggest_categorical('input_size', [96, 192, 336, 616]),
'early_stop_patience_steps': 5,
'scaler_type': trial.suggest_categorical('scaler_type', ['identity', 'minmax', 'robust', 'standard']),
'num_encoder_layers': trial.suggest_categorical('num_encoder_layers', [1, 2, 3]),
'num_decoder_layers': trial.suggest_categorical('num_decoder_layers', [1, 2, 3]),
'decoder_output_dim': trial.suggest_categorical('decoder_output_dim', [4, 8, 16, 32]),
'temporal_decoder_dim': trial.suggest_categorical('temporal_decoder_dim', [32, 64, 128]),
'dropout': trial.suggest_categorical('dropout', [0.0, 0.1, 0.2, 0.3, 0.5]),
'layernorm': trial.suggest_categorical('layernorm', [True, False]),
'learning_rate': trial.suggest_categorical('learning_rate', [1e-5, 1e-2]),
'batch_size': 512,
'random_seed': trial.suggest_int('random_seed', 1, 20),
'enable_progress_bar': False,
'enable_model_summary': False,
}
perform_tuning_and_forecasting(AutoTiDE, config_tide, df_pasta_train, df_pasta_test, 'Italian pasta', plot_forecasts = True)
Performing 1-year horizon forecasting for Italian pasta dataset
====================================================================================================
Model: AutoTiDE
MASE: 0.87
perform_tuning_and_forecasting(AutoTiDE, config_tide, df_energy_train, df_energy_test, 'Polish energy production', plot_forecasts = True)
Performing 1-year horizon forecasting for Polish energy production dataset
====================================================================================================
Model: AutoTiDE
MASE: 3.16
from neuralforecast.auto import AutoSOFTS
def config_softs(trial, n_series):
return {
'input_size': trial.suggest_categorical('input_size', [24, 48, 96, 192, 336, 512, 720]),
'hidden_size': trial.suggest_categorical('hidden_size', [32, 64, 128, 256, 512, 1024]),
'd_core': trial.suggest_categorical('d_core', [64, 128, 256, 512, 1024]),
'e_layers': trial.suggest_categorical('e_layers', [1, 2, 3, 4]),
'learning_rate': 0.0003,
'max_steps': 100,
'n_series': n_series,
'scaler_type': trial.suggest_categorical('scaler_type', ['identity', 'minmax', 'robust', 'standard']),
'random_seed': trial.suggest_int('random_seed', 1, 20),
'enable_progress_bar': False,
'enable_model_summary': False,
}
perform_tuning_and_forecasting(AutoSOFTS, config_softs, df_pasta_train, df_pasta_test, 'Italian pasta', plot_forecasts = True)
Performing 1-year horizon forecasting for Italian pasta dataset
====================================================================================================
Model: AutoSOFTS
MASE: 0.87
perform_tuning_and_forecasting(AutoSOFTS, config_softs, df_energy_train, df_energy_test, 'Polish energy production', plot_forecasts = True)
Performing 1-year horizon forecasting for Polish energy production dataset
====================================================================================================
Model: <class 'neuralforecast.auto.AutoSOFTS'>
MASE: 3.21
from neuralforecast.auto import AutoTSMixer

def config_tsmixer(trial, n_series):
    return {
        'input_size': trial.suggest_categorical('input_size', [24, 48, 96, 192, 336, 512, 720]),
        'learning_rate': trial.suggest_categorical('learning_rate', [0.001, 0.0001]),
        'n_block': trial.suggest_categorical('n_block', [1, 2, 4, 6, 8]),
        'dropout': trial.suggest_categorical('dropout', [0.1, 0.3, 0.5, 0.7, 0.9]),
        'max_steps': 200,
        'n_series': n_series,
        'scaler_type': trial.suggest_categorical('scaler_type', ['identity', 'minmax', 'robust', 'standard']),
        'random_seed': trial.suggest_int('random_seed', 1, 20),
        'enable_progress_bar': False,
        'enable_model_summary': False,
    }

perform_tuning_and_forecasting(AutoTSMixer, config_tsmixer, df_pasta_train, df_pasta_test, 'Italian pasta', plot_forecasts=True)
Performing 1-year horizon forecasting for Italian pasta dataset
====================================================================================================
Model: <class 'neuralforecast.auto.AutoTSMixer'>
MASE: 0.88
perform_tuning_and_forecasting(AutoTSMixer, config_tsmixer, df_energy_train, df_energy_test, 'Polish energy production', plot_forecasts=True)
Performing 1-year horizon forecasting for Polish energy production dataset
====================================================================================================
Model: <class 'neuralforecast.auto.AutoTSMixer'>
MASE: 2.74
from neuralforecast.auto import AutoTimeMixer

def config_timemixer(trial, n_series):
    return {
        'input_size': trial.suggest_categorical('input_size', [24, 48, 96, 192, 336, 512, 720]),
        'learning_rate': trial.suggest_categorical('learning_rate', [0.1, 0.001]),
        'batch_size': trial.suggest_categorical('batch_size', [8, 32, 128]),
        'd_model': trial.suggest_categorical('d_model', [16, 32, 128]),
        'e_layers': trial.suggest_categorical('e_layers', [2, 4, 5]),
        'max_steps': 100,
        'n_series': n_series,
        'scaler_type': trial.suggest_categorical('scaler_type', ['identity', 'minmax', 'robust', 'standard']),
        'random_seed': trial.suggest_int('random_seed', 1, 20),
        'enable_progress_bar': False,
        'enable_model_summary': False,
    }

perform_tuning_and_forecasting(AutoTimeMixer, config_timemixer, df_pasta_train, df_pasta_test, 'Italian pasta', plot_forecasts=True)
Performing 1-year horizon forecasting for Italian pasta dataset
====================================================================================================
Model: <class 'neuralforecast.auto.AutoTimeMixer'>
MASE: 0.86
perform_tuning_and_forecasting(AutoTimeMixer, config_timemixer, df_energy_train, df_energy_test, 'Polish energy production', plot_forecasts=True)
Performing 1-year horizon forecasting for Polish energy production dataset
====================================================================================================
Model: <class 'neuralforecast.auto.AutoTimeMixer'>
MASE: 2.88
Time series transformers (2 points)¶
Implement a time series transformer, e.g. PatchTST.
You can use either a pretrained variant or train it from scratch. If you want to tune hyperparameters, you can use e.g. an automated class from neuralforecast with the Ray or Optuna frameworks.
Training the model in any way is worth 2 points. You can also choose any other time series transformer, e.g. TFT, iTransformer, or Autoformer. Useful libraries: neuralforecast, PyTorch, transformers, IBM Granite. Each model after the first is worth 1 point. If you use PatchTST, the pretrained variant and the one trained from scratch count as two separate models.
Useful references:
- "A Time Series is Worth 64 Words: Long-term Forecasting with Transformers" Y. Nie et al.
- "Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting" B. Lim et al.
- "iTransformer: Inverted Transformers Are Effective for Time Series Forecasting" Y. Liu et al.
- "Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting" H. Wu et al.
- neuralforecast forecasting models list
- IBM Granite tutorial for pretrained PatchTST
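One practical note before the configs below: neuralforecast models expect data in long format, with `unique_id`, `ds` (timestamp), and `y` (value) columns, while our dataframes are wide (one column per series). A sketch of the conversion presumably done inside `perform_tuning_and_forecasting`; the helper name `to_long_format` is our own, not a library function:

```python
import pandas as pd

def to_long_format(df_wide: pd.DataFrame) -> pd.DataFrame:
    # neuralforecast expects columns: unique_id (series name), ds (timestamp), y (value)
    df_long = (
        df_wide.rename_axis("ds")
        .reset_index()
        .melt(id_vars="ds", var_name="unique_id", value_name="y")
    )
    return df_long[["unique_id", "ds", "y"]]

# tiny wide frame in the same shape as df_pasta
df_wide = pd.DataFrame(
    {"value_B1": [101.0, 136.0], "value_B2": [186.0, 248.0]},
    index=pd.to_datetime(["2014-01-02", "2014-01-03"]),
)
print(to_long_format(df_wide))
```

Each original column becomes a separate series identified by `unique_id`, so one fitted model forecasts all of them jointly.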
from neuralforecast.auto import AutoPatchTST

def config_patchtst(trial):
    return {
        'input_size': trial.suggest_categorical('input_size', [24, 48, 96, 192, 336, 720]),
        'encoder_layers': trial.suggest_categorical('encoder_layers', [3, 4, 5]),
        'hidden_size': trial.suggest_categorical('hidden_size', [16, 128, 256]),
        'linear_hidden_size': trial.suggest_categorical('linear_hidden_size', [128, 256, 512]),
        'n_heads': trial.suggest_categorical('n_heads', [4, 16]),
        'dropout': 0.2,
        'patch_len': trial.suggest_categorical('patch_len', [16, 32, 64]),
        'stride': trial.suggest_categorical('stride', [8, 16]),
        'activation': 'gelu',
        'random_seed': trial.suggest_int('random_seed', 1, 20),
        'max_steps': 100,
        'scaler_type': trial.suggest_categorical('scaler_type', ['identity', 'minmax', 'robust', 'standard']),
        'val_check_steps': 10,
        'early_stop_patience_steps': 2,
        'enable_progress_bar': False,
        'enable_model_summary': False,
    }

perform_tuning_and_forecasting(AutoPatchTST, config_patchtst, df_pasta_train, df_pasta_test, 'Italian pasta', plot_forecasts=True)
Performing 1-year horizon forecasting for Italian pasta dataset
====================================================================================================
Model: <class 'neuralforecast.auto.AutoPatchTST'>
MASE: 0.86
perform_tuning_and_forecasting(AutoPatchTST, config_patchtst, df_energy_train, df_energy_test, 'Polish energy production', plot_forecasts=True)
Performing 1-year horizon forecasting for Polish energy production dataset
====================================================================================================
Model: <class 'neuralforecast.auto.AutoPatchTST'>
MASE: 3.44
from neuralforecast.auto import AutoiTransformer

def config_itransformer(trial, n_series):
    return {
        'input_size': trial.suggest_categorical('input_size', [24, 48, 96, 192, 336, 720]),
        'n_series': n_series,
        'learning_rate': trial.suggest_categorical('learning_rate', [0.0001, 0.0003, 0.0005, 0.001]),
        'e_layers': trial.suggest_categorical('e_layers', [1, 2, 3, 4]),
        'd_layers': trial.suggest_categorical('d_layers', [1, 2, 3, 4]),
        'hidden_size': trial.suggest_categorical('hidden_size', [64, 256, 512, 1024]),
        'random_seed': trial.suggest_int('random_seed', 1, 20),
        'max_steps': 200,
        'scaler_type': trial.suggest_categorical('scaler_type', ['identity', 'minmax', 'robust', 'standard']),
        'enable_progress_bar': False,
        'enable_model_summary': False,
    }

perform_tuning_and_forecasting(AutoiTransformer, config_itransformer, df_pasta_train, df_pasta_test, 'Italian pasta', plot_forecasts=True)
Performing 1-year horizon forecasting for Italian pasta dataset
====================================================================================================
Model: <class 'neuralforecast.auto.AutoiTransformer'>
MASE: 0.87
perform_tuning_and_forecasting(AutoiTransformer, config_itransformer, df_energy_train, df_energy_test, 'Polish energy production', plot_forecasts=True)
Performing 1-year horizon forecasting for Polish energy production dataset
====================================================================================================
Model: <class 'neuralforecast.auto.AutoiTransformer'>
MASE: 3.67
Pretrained foundation models (2 points)¶
Use a pretrained time series foundation model for zero-shot forecasting.
Examples include TimesFM, Lag-Llama, TimeGPT, and Moirai. Model notes:
- TimesFM - using the PyTorch version of the original library is suggested
- Lag-Llama - this is a probabilistic model; note that we are interested in point forecasts (the mean of the predictive distribution)
- TimeGPT - as this is a proprietary model, you need to provide an API token; make sure you don't push it to a public repository!
The first model is worth 2 points, and subsequent ones are worth 1 point each.
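Regarding the Lag-Llama note above: probabilistic models return multiple sample paths per forecast, and a point forecast is obtained by averaging over the sample axis. A minimal sketch with synthetic samples; the shapes and values are purely illustrative, not Lag-Llama's actual output format:

```python
import numpy as np

# pretend these are sample paths from a probabilistic forecaster,
# shape (n_samples, horizon): 100 sampled trajectories over 7 steps
rng = np.random.default_rng(0)
samples = rng.normal(loc=10.0, scale=2.0, size=(100, 7))

# point forecast: mean across the sample axis, one value per horizon step
point_forecast = samples.mean(axis=0)
print(point_forecast.shape)  # (7,)
```

The same reduction gives prediction intervals for free, e.g. `np.quantile(samples, [0.05, 0.95], axis=0)`.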
Useful references:
- "A decoder-only foundation model for time-series forecasting" A. Das et al.
- TimesFM repository
- "Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting" K. Rasul et al.
- Lag-Llama repository
- "TimeGPT-1" A. Garza et al.
- TimeGPT docs
- "Unified Training of Universal Time Series Forecasting Transformers" G. Woo et al.
- HuggingFace Moirai model page